ABSTRACT

We present a new linear-time algorithm for constructing
all strongly connected components of a directed graph,
and show how to apply it in the iterative solution of
data-flow analysis problems, to obtain a simple algorithm
which improves the Hecht-Ullman algorithm.




1. Introduction

We present here a new algorithm for constructing all
strongly connected components of a directed graph. Like Tarjan's
well-known algorithm (cf. [AHU, Ch. 5.7]), it uses a depth-first
spanning tree (forest) T, and is linear in the number
of nodes and edges of the graph. However, our algorithm
differs from Tarjan's in that it produces these components in
reverse postorder of their roots (relative to T), and also
orders the nodes within each component in reverse postorder.
Together, these orderings induce a modified reverse postorder
of the graph nodes, which facilitates certain iterative
algorithms related to data-flow analysis. We describe
and analyze such a data-flow algorithm, which improves the
algorithm of Hecht and Ullman [HU]. Even though the ordering
we are concerned with can also be obtained from Tarjan's
algorithm with one additional tree walk, we present
our algorithm as a simpler alternative.

The strong-connectivity algorithm is presented and
analyzed in Section 2. Section 3 discusses its applications
to the solution of data-flow analysis problems.


2. A Strong-Connectivity Algorithm

Let us assume that we are given a directed graph G
rooted at a unique 'entry' node r. Let N be the set of nodes
of G and E the set of its edges. For each $n \in N$ denote by
SCC(n) the strongly-connected component of G containing n,
i.e. the maximal set of nodes containing n such that G
restricted to that set is strongly connected. Let T be a
depth-first spanning tree for G.

The following algorithm computes two objects: a list SCCS
of the roots of the strongly-connected components, in their
reverse postorder (relative to T), and a map SCCNODES mapping
each (root of a) strongly-connected component to the list of
its nodes, in the same reverse postorder. The algorithm
proceeds in the following steps:

Algorithm SCOMPS:

(1) Initialize SCCS to the null list and SCCNODES to the
null map. Also initialize an auxiliary map SCCROOT,
which will map each node in N to the root of its
strongly-connected component, to the null map.

(2) Iterate through N in reverse postorder. Let $h \in N$ be
the node currently visited.

(3) If SCCROOT(h) is still undefined, then h is the root of
a new strongly-connected component. In this case we
compute the set

$$S(h) = \{h\} \cup \{\, w : w \text{ is a T-descendant of } h \text{ which can reach } h \text{ along a path consisting solely of T-descendants of } h \,\}$$

and extend the SCCROOT map by mapping all $w \in S(h)$ to h.
The set S(h) is computed as follows (square brackets denote
ordered tuples; NEW is a set, so no node is ever held in it twice):

    S(h) := [h];
    SCCROOT(h) := h;
    NEW := {w : (w,h) ∈ E and w is a T-descendant of h};
    (while NEW is not empty)
        remove an element w from NEW;
        SCCROOT(w) := h;
        add w to S(h);
        add to NEW all nodes v such that
            (v,w) ∈ E, v is a T-descendant of h,
            and SCCROOT(v) is undefined;
    end while;

Note that we can test the condition 'v is a T-descendant of h'
rapidly using the formula

$$v \text{ is a T-descendant of } h \iff \mathrm{pre}(h) < \mathrm{pre}(v) \le \mathrm{pre}(h) + (\text{number of T-descendants of } h)$$

(where pre(x) denotes the preorder index of a node x).

However, using the special properties of depth-first spanning
trees, we can replace the above test by the following simpler
test:

In the above construction of S(h), v is a T-descendant
of h iff $\mathrm{post}(v) < \mathrm{post}(h)$
(where post(x) is the postorder index of a node x).

Indeed, every T-descendant of h has a smaller postorder index
than h, so only the converse implication needs explanation.
Assume that $\mathrm{post}(v) < \mathrm{post}(h)$. Then either v is a
T-descendant of h, or else v lies to the left of h. Only the first
case is possible: since w is either h itself or a T-descendant of h,
in the second case (v,w) would be a left-to-right cross edge, which
cannot occur in a depth-first spanning tree. (We are grateful
to Gerald Fisher for this observation.)

Having computed S(h), we add h to SCCS, and set
SCCNODES(h) := [h].

(4) If SCCROOT(h) = u is already defined,
we add h to the end of SCCNODES(u).
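The steps above can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the original implementation: the graph is given as a node list and an edge list, every node is assumed reachable from r, and the names `scomps`, `succ`, `pred` and `below` are hypothetical. Postorder indices come from an iterative depth-first search, the test post(v) < post(h) stands in for the descendant check, and NEW is kept as a Python set.

```python
from collections.abc import Hashable, Iterable
from typing import Dict, List, Tuple

Node = Hashable


def scomps(nodes: Iterable[Node], edges: Iterable[Tuple[Node, Node]], root: Node):
    """Sketch of Algorithm SCOMPS; returns (SCCS, SCCNODES, SCCROOT)."""
    succ: Dict[Node, List[Node]] = {n: [] for n in nodes}
    pred: Dict[Node, List[Node]] = {n: [] for n in succ}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Iterative depth-first search from the root; post[n] is n's postorder index.
    post: Dict[Node, int] = {}
    visited = {root}
    stack = [(root, iter(succ[root]))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child not in visited:
                visited.add(child)
                stack.append((child, iter(succ[child])))
                break
        else:  # every child has been handled, so node finishes
            stack.pop()
            post[node] = len(post)

    def below(v: Node, h: Node) -> bool:
        # post(v) < post(h) replaces "v is a T-descendant of h";
        # nodes unreachable from the root are ignored.
        return v in post and post[v] < post[h]

    sccs: List[Node] = []
    sccnodes: Dict[Node, List[Node]] = {}
    sccroot: Dict[Node, Node] = {}
    # Step (2): visit the nodes in reverse postorder.
    for h in sorted(post, key=post.__getitem__, reverse=True):
        if h in sccroot:
            # Step (4): h joins the component of its already-known root.
            sccnodes[sccroot[h]].append(h)
            continue
        # Step (3): h roots a new component; grow S(h) along reversed edges.
        sccroot[h] = h
        new = {w for w in pred[h] if below(w, h) and w not in sccroot}
        while new:
            w = new.pop()
            sccroot[w] = h
            new.update(v for v in pred[w] if below(v, h) and v not in sccroot)
        sccs.append(h)
        sccnodes[h] = [h]
    return sccs, sccnodes, sccroot
```

For the hypothetical graph with edges (r,a), (a,b), (b,a), (b,c), (c,d), (d,c), the sketch returns SCCS = [r, a, c], with SCCNODES(a) = [a, b] and SCCNODES(c) = [c, d].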

Before proving the correctness of this algorithm, we give
an example illustrating it.

Example. Consider the following flow graph:

[Figure omitted: example flow graph]

which has the following DFST (we label nodes with their
postorder indices):

[Figure omitted: depth-first spanning tree of the example graph]

The reverse postorder is then

a b d e g j k h i l f c

We now list the actions taken by our algorithm as it iterates
through that sequence:


Node visited   Action

a              SCCROOT(a) := a;  SCCS := [a];  SCCNODES(a) := [a]
b              SCCROOT(b,c,d,e,f) := b;  SCCS := [a,b];  SCCNODES(b) := [b]
d              SCCNODES(b) := [b,d]
e              SCCNODES(b) := [b,d,e]
g              SCCROOT(g) := g;  SCCS := [a,b,g];  SCCNODES(g) := [g]
j              SCCROOT(j,k) := j;  SCCS := [a,b,g,j];  SCCNODES(j) := [j]
k              SCCNODES(j) := [j,k]
h              SCCROOT(h,i) := h;  SCCS := [a,b,g,j,h];  SCCNODES(h) := [h]
i              SCCNODES(h) := [h,i]
l              SCCROOT(l) := l;  SCCS := [a,b,g,j,h,l];  SCCNODES(l) := [l]
f              SCCNODES(b) := [b,d,e,f]
c              SCCNODES(b) := [b,d,e,f,c]


Next we prove the correctness and linearity of
algorithm SCOMPS.

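Lemma 1. For each h processed by step (3) of the algorithm,
S(h) is a subtree of T rooted at h.

Proof: Let $v \in S(h)$, and let u be a T-ancestor of v and a
T-descendant of h. Then u reaches v along the tree path from u to v,
which consists solely of T-descendants of h, and v reaches h along
such a path as well; hence u also belongs to S(h).

Q.E.D.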

Lemma 2. For any two distinct nodes $h_1, h_2$ processed during step (3)
of the algorithm, $S(h_1) \cap S(h_2) = \emptyset$.

Proof: Suppose that $h_1$ has a higher postorder number than $h_2$
(i.e. $h_1$ is visited before $h_2$), and that there exists
$w \in S(h_1) \cap S(h_2)$. Then w lies in the T-subtrees rooted at
both $h_1$ and $h_2$, so one of these two nodes is a T-ancestor of the
other; since $h_1$ has the higher postorder number, $h_1$ must be a
T-ancestor of $h_2$. Lemma 1 then implies that $h_2 \in S(h_1)$, since
$h_2$ lies on the tree path from $h_1$ to w. But then SCCROOT($h_2$)
will be set to $h_1$ when $h_1$ is processed, so that $h_2$ cannot be
processed by step (3). This contradiction proves our assertion.

Q.E.D.

Lemma 3. For each h processed by step (3) we have

S(h) = SCC(h).

Proof: Obviously, S(h) is strongly connected and contains h.
To show maximality, we proceed by induction on the nodes in
reverse postorder. Let h = r, the first node in reverse
postorder, and suppose that there exists some $w \in \mathrm{SCC}(h) - S(h)$.
Since every node is a T-descendant of r, w is a T-descendant of h
from which a path can reach h, so that $w \in S(h)$, which
contradicts our assumption.

Next suppose the assertion to be true for all nodes preceding
some node h processed during step (3) of the algorithm,
but that $\mathrm{SCC}(h) - S(h)$ is not empty and contains a node w.
If w has a higher postorder index than h, then so has
v = SCCROOT(w), and by the induction hypothesis, SCC(v) = S(v).
Since w belongs to both SCC(v) and SCC(h), these components
coincide, so $h \in S(v)$; hence h cannot be processed by step (3),
since SCCROOT(h) will already have been set to v by the time h is
visited. Therefore h must have a higher postorder index than w.
But then, if w and all other nodes lying on a path from w to h
(which exists since $w \in \mathrm{SCC}(h)$) were T-descendants of h,
w would have been placed in S(h) by definition. Therefore, at least
one node $w_1$ along such a path must lie to the left of h or above h
in the tree T. It is easy to see that this path must then pass
through some T-ancestor u of h and $w_1$ (note that $u \ne h$). Thus
u, h and $w_1$ all belong to the same strongly-connected
component, which then also contains v = SCCROOT(u). But v has
a higher postorder index than h, so that, by the induction
hypothesis, SCC(v) = S(v). Hence $h \in S(v)$, which implies,
as before, that h cannot be processed by step (3) of the
algorithm. Hence SCC(h) = S(h), and this completes the proof.

Q.E.D.


Theorem 1. Algorithm SCOMPS correctly computes all strongly-connected
components of G in the required external and
internal orders, and takes $O(\max(\#N, \#E))$ time.

Proof: Lemma 3 asserts that each set S(h) constructed in
step (3) is a strongly-connected component, and Lemma 2
implies that no such component is computed more than once.
Since the union of all these sets is N (as is easily seen by
noting that at the end of the iteration of step (2), SCCROOT
must be everywhere defined), it follows that the algorithm
computes all strongly-connected components of the graph.
Our claims concerning the external ordering of these
components in SCCS, as well as the internal orderings of
their nodes as reflected in SCCNODES, then follow
immediately, since roots are appended to SCCS, and nodes to
SCCNODES, in the order in which step (2) visits them.

To analyze the time complexity of the algorithm, we
note that the only part of the algorithm whose linearity
in $\max(\#N, \#E)$ is nontrivial is the computation of the sets
S(h). However, in the construction of the NEW sets, no edge is
considered more than once. This is because an edge is considered
only when its target is being added to some set S(h), and by
Lemma 2 this can happen only once during execution of the
algorithm. For similar reasons, no node is added to any NEW set
more than once. Hence this part of the algorithm is also linear in
$\max(\#N, \#E)$.

Q.E.D.
Next we note some useful properties of our ordering
of strongly-connected components (similar properties are
discussed in [Ke]).

Lemma 4. Let $h_1, h_2$ be roots of different strongly-connected
components, and let $u \in \mathrm{SCC}(h_1)$, $v \in \mathrm{SCC}(h_2)$
be such that $(u,v) \in E$. Then $h_1$ has a higher postorder
number than $h_2$.

Proof: By Lemmas 1 and 3, $\mathrm{SCC}(h_1)$ and $\mathrm{SCC}(h_2)$ are
subtrees of T rooted at $h_1$ and $h_2$, respectively. If the assertion
is false, then $h_2$ is either a T-ancestor of $h_1$, or else $h_2$ is
to the right of $h_1$. In the first case $h_2$ is also a T-ancestor of
u, so $h_2$ reaches u along a tree path, and u reaches $h_2$ via the
edge (u,v), which enters $\mathrm{SCC}(h_2)$. Hence $u \in \mathrm{SCC}(h_2)$,
a contradiction. In the second case (u,v) is an impossible
left-to-right cross edge. These contradictions prove
the lemma.

Q.E.D.

Corollary 5. If each strongly-connected component of G
is reduced to a single node, then the reduced graph is
acyclic, and is topologically sorted by the reverse postorder
of the roots of the strongly-connected components.
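Corollary 5 can be checked mechanically on the output of SCOMPS. The sketch below is a hypothetical helper (the name `is_topological` and the hand-written example data are ours): it verifies that every edge joining two different components leaves an earlier root and enters a later one in SCCS, which is exactly the topological-sort property of the reduced graph.

```python
from collections.abc import Hashable, Iterable
from typing import Dict, List, Tuple

Node = Hashable


def is_topological(edges: Iterable[Tuple[Node, Node]],
                   sccroot: Dict[Node, Node], sccs: List[Node]) -> bool:
    """Check Corollary 5: each edge between two different components
    must go from an earlier root to a later root in SCCS."""
    position = {h: i for i, h in enumerate(sccs)}
    for u, v in edges:
        hu, hv = sccroot[u], sccroot[v]
        if hu != hv and position[hu] > position[hv]:
            return False
    return True


# Hypothetical example: components {r}, {b, d} and {a, c}.
EDGES = [("r", "a"), ("r", "b"), ("a", "c"), ("c", "a"),
         ("b", "c"), ("b", "d"), ("d", "b")]
SCCROOT = {"r": "r", "b": "b", "d": "b", "a": "a", "c": "a"}
print(is_topological(EDGES, SCCROOT, ["r", "b", "a"]))  # True
print(is_topological(EDGES, SCCROOT, ["r", "a", "b"]))  # False: edge (b, c) runs backwards
```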

Corollary 6. Every path p in G can be decomposed as

$$p = p_1 \,\|\, p_2 \,\|\, \cdots \,\|\, p_k,$$

where $\|$ denotes path concatenation, $h_1, \dots, h_k$ are roots of
strongly-connected components in reverse postorder,
and where the subpath $p_i$ passes only through nodes of $\mathrm{SCC}(h_i)$.
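The decomposition of Corollary 6 amounts to cutting a path wherever the component root changes. In the following sketch (function names and example data are hypothetical), `decompose_path` groups consecutive nodes by their SCCROOT value, and `roots_follow_sccs` confirms that the resulting roots are distinct and appear in SCCS order, as the corollary asserts.

```python
from collections.abc import Hashable, Sequence
from itertools import groupby
from typing import Dict, List, Tuple

Node = Hashable


def decompose_path(path: Sequence[Node],
                   sccroot: Dict[Node, Node]) -> List[Tuple[Node, List[Node]]]:
    """Split a path into maximal runs p_1, ..., p_k of nodes sharing an SCC root."""
    return [(h, list(run)) for h, run in groupby(path, key=sccroot.__getitem__)]


def roots_follow_sccs(pieces: List[Tuple[Node, List[Node]]], sccs: List[Node]) -> bool:
    """The roots h_1, ..., h_k must appear in strictly increasing SCCS order."""
    position = {h: i for i, h in enumerate(sccs)}
    idx = [position[h] for h, _ in pieces]
    return all(a < b for a, b in zip(idx, idx[1:]))


# Hypothetical example: components {r}, {b, d}, {a, c} with SCCS = [r, b, a].
SCCROOT = {"r": "r", "b": "b", "d": "b", "a": "a", "c": "a"}
pieces = decompose_path(["r", "b", "d", "b", "c", "a", "c"], SCCROOT)
print(pieces)  # [('r', ['r']), ('b', ['b', 'd', 'b']), ('a', ['c', 'a', 'c'])]
print(roots_follow_sccs(pieces, ["r", "b", "a"]))  # True
```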
3. Application to Data-Flow Analysis

The above observations yield a simple but useful improvement
of the iterative data-flow algorithm of Hecht and
Ullman [HU]. This algorithm solves the following set of
equations for the data-values $x_n$ attached to the nodes of a given
flow-graph (i.e. a rooted directed graph) G with root r and set of
nodes N, where each $x_n$ belongs to a given semilattice L, the
functions $f_{(m,n)}$ belong to a class F of isotone maps
acting on L, and $x_0 \in L$ is an initial value:

$$
\begin{aligned}
x_r &\le x_0, \\
x_n &= \bigwedge \{\, f_{(m,n)}(x_m) : (m,n) \in E \,\}, \qquad n \in N.
\end{aligned}
\tag{$*$}
$$


The algorithm in [HU] arranges the nodes of N in reverse postorder
(with respect to a depth-first spanning tree), and then
iterates through this sequence repeatedly (starting with some
'largest' initial value of the $x_n$'s), applying (*) to obtain
successive approximations of the solution, until these
approximations converge to the maximal fixpoint of (*).
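To make (*) concrete, the following sketch assumes a hypothetical bit-vector instance in the style of available-expressions analysis: L is the set of subsets of a finite universe U ordered by inclusion, the meet is set intersection, the largest element is U, and each edge function $f_{(m,n)}(x) = \mathit{gen}_m \cup (x - \mathit{kill}_m)$ depends only on the source node m. All names and data are ours. The sweep follows the description of [HU] above; for the root it meets $x_0$ with any predecessor contributions, a convention we adopt only for the case where r has predecessors (in the example r has none).

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Value = FrozenSet[str]
Transfer = Callable[[Value], Value]


def gen_kill(gen: set, kill: set) -> Transfer:
    """Isotone edge function x -> gen | (x - kill) of a bit-vector problem."""
    g, k = frozenset(gen), frozenset(kill)
    return lambda x: g | (x - k)


def hecht_ullman(rpo: List[str], preds: Dict[str, List[str]], root: str,
                 x0: Value, top: Value, f: Dict[str, Transfer]
                 ) -> Tuple[Dict[str, Value], int]:
    """Sweep all nodes in reverse postorder, applying (*), until nothing changes.
    f[m] plays the role of f_(m,n) for every edge (m, n) leaving m."""
    x = {n: top for n in rpo}                # start from the largest value
    x[root] = x0
    passes, changed = 0, True
    while changed:
        changed, passes = False, passes + 1
        for n in rpo:
            new = x0 if n == root else top   # meet over no predecessors is top
            for m in preds[n]:
                new = new & f[m](x[m])       # meet = set intersection
            if new != x[n]:
                x[n], changed = new, True
    return x, passes


# Hypothetical instance: loop {a, b} followed by loop {c, d}.
U = frozenset({"e1", "e2", "e3"})
EDGES = [("r", "a"), ("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]
RPO = ["r", "a", "b", "c", "d"]
PREDS = {n: [m for m, k in EDGES if k == n] for n in RPO}
F = {"r": gen_kill({"e1"}, set()), "a": gen_kill({"e2"}, set()),
     "b": gen_kill(set(), {"e1"}), "c": gen_kill({"e3"}, {"e2"}),
     "d": gen_kill(set(), set())}
x, passes = hecht_ullman(RPO, PREDS, "r", frozenset(), U, F)
print(passes)  # 3
```

On this instance the sweep visits all five nodes in each of three passes; the third pass only confirms that nothing changes.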

As already noted in [Ke], this approach suffers from
some obvious inefficiencies. For example, let $n_1, \dots, n_k$ be
the nodes of N in reverse postorder, and assume that G
contains only one loop, consisting of the blocks $n_i, \dots, n_j$.
Then, obviously, $n_1, \dots, n_{i-1}$ have to be processed only once,
as there is no information 'feedback' to these nodes from
succeeding nodes. Also, once information has stabilized
along that loop, $n_{j+1}, \dots, n_k$ also need be processed only
once. However, each iteration of the Hecht-Ullman algorithm
would process the whole sequence $n_1, \dots, n_k$.

We therefore propose to revise the Hecht-Ullman algorithm
as follows:

(1) Apply the strong-connectivity algorithm of the previous
section to the flow-graph G, to obtain SCCS and SCCNODES.

(2) Initialize the solution map x so that $x_r = x_0$, and for
all $n \in N - \{r\}$ put $x_n = \Omega$, where $\Omega$ is a special
largest element of L denoting an undefined data-value.

(3) Iterate once through all the nodes of SCCS
(in their reverse postorder).

(4) Let h be the currently visited node.
Iterate repeatedly through the nodes in SCCNODES(h)
(i.e. in their relative reverse postorder),
applying (*) to obtain successive approximations to
the solution as in the Hecht-Ullman algorithm, until
information stabilizes for all these nodes.
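The revised procedure changes only the outer loop. The sketch below reuses the same hypothetical bit-vector instance (gen/kill edge functions, intersection as meet, $\Omega$ = the full universe U) and takes SCCS and SCCNODES as inputs, written out by hand for the example graph with components {r}, {a, b} and {c, d}. It counts node visits; processing a one-node component without a self-loop in a single pass is a detail of this sketch, reflecting the observation above that such nodes receive no feedback.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Value = FrozenSet[str]
Transfer = Callable[[Value], Value]


def gen_kill(gen: set, kill: set) -> Transfer:
    """Isotone edge function x -> gen | (x - kill) of a bit-vector problem."""
    g, k = frozenset(gen), frozenset(kill)
    return lambda x: g | (x - k)


def componentwise(sccs: List[str], sccnodes: Dict[str, List[str]],
                  preds: Dict[str, List[str]], root: str, x0: Value,
                  top: Value, f: Dict[str, Transfer]
                  ) -> Tuple[Dict[str, Value], int]:
    """Steps (2)-(4): solve each component to stability, in SCCS order."""
    x = {n: top for h in sccs for n in sccnodes[h]}   # step (2): Omega = top
    x[root] = x0
    visits = 0
    for h in sccs:                                     # step (3)
        nodes = sccnodes[h]
        # A one-node component without a self-loop receives no feedback.
        single_pass = len(nodes) == 1 and h not in preds[h]
        changed = True
        while changed:                                 # step (4)
            changed = False
            for n in nodes:
                visits += 1
                new = x0 if n == root else top
                for m in preds[n]:
                    new = new & f[m](x[m])
                if new != x[n]:
                    x[n], changed = new, True
            if single_pass:
                break
    return x, visits


# Same hypothetical instance; SCCS and SCCNODES written out by hand.
U = frozenset({"e1", "e2", "e3"})
EDGES = [("r", "a"), ("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]
SCCS = ["r", "a", "c"]
SCCNODES = {"r": ["r"], "a": ["a", "b"], "c": ["c", "d"]}
PREDS = {n: [m for m, k in EDGES if k == n] for n in "rabcd"}
F = {"r": gen_kill({"e1"}, set()), "a": gen_kill({"e2"}, set()),
     "b": gen_kill(set(), {"e1"}), "c": gen_kill({"e3"}, {"e2"}),
     "d": gen_kill(set(), set())}
x, visits = componentwise(SCCS, SCCNODES, PREDS, "r", frozenset(), U, F)
print(visits)  # 11
```

On this instance the component-wise loop makes 11 node visits, whereas a round-robin sweep over all five nodes in reverse postorder needs three passes, i.e. 15 visits; both end at the same fixpoint.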
Theorem 2. The above algorithm converges if the semilattice
L is well-founded, and yields the maximal fixpoint of (*).

Proof: Convergence is proved in a standard manner by noting
that the successive values of the solution map x form a
nonincreasing sequence in $L^N$ with only finitely many
repetitions. Since we compute x by successive approximations,
starting with an initial value which is larger than any
solution of (*), it is sufficient to show that the final
value of x is a solution of (*) to deduce that it is the
maximal fixpoint solution. To see this, let $n \in N$ be a node
for which $x_n \ne \bigwedge \{\, f_{(m,n)}(x_m) : (m,n) \in E \,\}$ when the
algorithm terminates. Let h be the root of the strongly-connected
component containing n. By Lemma 4, each predecessor
m of n either also belongs to SCC(h), or else to SCC(h')
for some h' preceding h in SCCS. This implies that the final
values of x at n and its predecessors are the same as these
values at the end of processing SCC(h) in step (4), since
later steps only update nodes of components that follow h in
SCCS. But these values must satisfy (*), which contradicts our
assumption. Q.E.D.

Remarks. (1) Step (4) can impose an a priori upper bound on
the number of iterations required for each strongly-connected
component S, provided that the data-flow analysis is of the
'bitvectoring' type (such as available-expressions analysis),
or, more generally, has a framework of the kind described in
[KU]. As in [KU], $d_S + 1$ iterations suffice, where $d_S$ is the
maximal number of back edges along any acyclic path in S.
For practical purposes, this bound can be estimated by
another bound $d'_S + 1$, where $d'_S$ is the number of back-edge
target nodes in S (an acyclic path enters each such node at most
once, so $d_S \le d'_S$). These quantities can easily be computed
during the execution of the strong-connectivity algorithm
SCOMPS, and can then be used to limit the number of
iterations required in step (4) of the algorithm.

(2) It follows from the last remark that if $S_1, \dots, S_k$ are
the strongly-connected components of G in the reverse
postorder of their roots, then the sequence

$$S = \sum_{i=1}^{k} (d_{S_i} + 1) \cdot S_i$$

(where summation corresponds to tuple concatenation and
multiplication by an integer to tuple replication) is a
(strong) node-listing for G in the terminology of [Ke].
This node listing represents a good compromise between the
coarse Hecht-Ullman approach and more sophisticated, but
more intricate, node-listing methods. Indeed, in the light
of the comments in [Ke], it is clear that our algorithm
can be regarded as essentially using, for each strongly-connected
component of the graph, the crude node-listing described by the
standard Hecht-Ullman iteration, and as combining these listings
in the way suggested in [Ke].
(3) Applications to interprocedural data-flow analysis
of the attribute-analysis algorithm described above and of
the strong-connectivity algorithm are given in [SS].
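Remarks (1) and (2) can be illustrated together. The sketch below is hypothetical in its names and data: it finds back-edge targets with a separate depth-first search (an edge (u, v) counts as a back edge when v is still on the search path, which includes self-loops), counts them per component to obtain $d'_S$, and builds the node listing using the estimate $d'_S$ in place of $d_S$. Folding the count into SCOMPS itself, as Remark (1) suggests, would avoid the second search.

```python
from collections.abc import Hashable
from typing import Dict, List, Set

Node = Hashable


def back_edge_targets(succ: Dict[Node, List[Node]], root: Node) -> Set[Node]:
    """Targets v of back edges (u, v): v is a T-ancestor of u, or v == u."""
    visited, on_stack, targets = {root}, {root}, set()
    stack = [(root, iter(succ.get(root, [])))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child in on_stack:            # child is on the DFS path: back edge
                targets.add(child)
            elif child not in visited:       # tree edge: descend into child
                visited.add(child)
                on_stack.add(child)
                stack.append((child, iter(succ.get(child, []))))
                break
        else:                                # node finishes
            stack.pop()
            on_stack.discard(node)
    return targets


def node_listing(succ: Dict[Node, List[Node]], root: Node, sccs: List[Node],
                 sccnodes: Dict[Node, List[Node]],
                 sccroot: Dict[Node, Node]) -> List[Node]:
    """Concatenate d'_S + 1 copies of each component's nodes, in SCCS order."""
    d_prime = {h: 0 for h in sccs}
    for v in back_edge_targets(succ, root):
        d_prime[sccroot[v]] += 1             # d'_S: back-edge targets in S
    listing: List[Node] = []
    for h in sccs:
        listing.extend(sccnodes[h] * (d_prime[h] + 1))
    return listing


# Hypothetical example: components {r}, {b, d}, {a, c} with SCCS = [r, b, a].
SUCC = {"r": ["a", "b"], "a": ["c"], "c": ["a"], "b": ["c", "d"], "d": ["b"]}
SCCS = ["r", "b", "a"]
SCCNODES = {"r": ["r"], "b": ["b", "d"], "a": ["a", "c"]}
SCCROOT = {"r": "r", "b": "b", "d": "b", "a": "a", "c": "a"}
print(node_listing(SUCC, "r", SCCS, SCCNODES, SCCROOT))
# ['r', 'b', 'd', 'b', 'd', 'a', 'c', 'a', 'c']
```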




References 

[AHU] Aho, A. V., Hopcroft, J. E. and Ullman, J. D.,
"The Design and Analysis of Computer Algorithms,"
Addison-Wesley, 1974.

[HU] Hecht, M. S. and Ullman, J. D.,
"A Simple Algorithm for Global Data Flow Analysis Problems,"
SIAM J. Computing 4 (1975) 519-532.

[KU] Kam, J. B. and Ullman, J. D.,
"Global Data Flow Analysis and Iterative Algorithms,"
JACM 23 (1976) 158-171.

[Ke] Kennedy, K. W.,
"Node Listings Applied to Data-Flow Analysis,"
Proc. 2nd POPL Conference (1975) 10-21.

[SS] Schwartz, J. T. and Sharir, M.,
"A Design of Optimizations of the Bitvectoring Class,"
Courant Institute Tech. Rept., 1979 (to appear).




NYU Comp. Sci. Dept. TR-0li4
Sharir
A strong-connectivity algorithm and its ...




N.Y.U. Courant Institute of Mathematical Sciences
251 Mercer St.
New York, N. Y. 10012


